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An  Iterative  Extension  of  Prony's  Method  for 
ARMA  Signal  Modeling* 

Charles  W.  Therrien   and  Carlos  H.  Velasco 
Department  of  Electrical  and  Computer  Engineering 


Abstract 

A  new  iterative  version  of  the  Prony  method  is  presented  and  shown  to  be 
exceptionally  effective  in  finding  ARMA  models  for  acoustic  data  in  the  signal 
domain.  The  method  is  based  on  a  quadratic  type  of  gradient  algorithm  where  it 
is  shown  that  the  gradient  and  Hessian  are  easily  computed  from  the  data.  The 
new  algorithm  is  found  experimentally  to  have  excellent  convergence  behavior. 
The  performance  of  the  algorithm  is  demonstrated  and  compared  to  that  of  the 
basic  Prony  method  and  to  that  of  the  Steiglitz  and  McBride  iterative  prefiltering 
algorithm  on  some  recorded  acoustic  data. 

1      Introduction 

Linear  pole-zero  (ARMA)  models  are  used  for  a  variety  of  applications  in  signal  pro- 
cessing. These  applications  include  speech  and  image  coding,  spectrum  estimation,  and 
others.  Both  deterministic  and  stochastic  methods  for  signal  modeling  have  been  stud- 
ied extensively  in  the  literature.  Typically  deterministic  models  attempt  to  represent 
the  signal  as  the  impulse  response  of  a  suitably  chosen  linear  system  while  stochastic 
methods  attempt  to  model  the  signal  as  the  response  of  the  linear  system  to  white  noise. 
However  in  many  cases  the  procedures  embedded  in  these  methods  (such  as  estimating 
a  correlation  matrix  for  the  data)  are  similar  so  the  philosophical  distinction  between 
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deterministic  and  stochastic  models  is  of  less  significance.  A  review  of  various  methods 
can  be  found  in  many  places  such  as  [1.  2,  3]. 

In  our  application  we  are  interested  in  reproducing  the  time  domain  waveform  as 
accurately  as  possible  (as  measured  e.g.,  by  the  sum  of  squared  errors).  To  do  this 
frequently  requires  modeling  procedures  that  can  produce  nonminimum-phase  transfer 
functions  [4,  5]  and  this  eliminates  most  of  the  stochastic  methods.  Consequently,  it  is 
most  appropriate  here  to  think  of  the  signal  as  being  modeled  as  the  impulse  response 
of  a  linear  system  and  seek  to  minimize  the  sum  of  squared  errors  between  this  impulse 
response  and  the  given  data. 

A  well  known  approach  to  the  modeling  problem  is  Prony's  method,  where  the  data 
is  represented  in  the  time  domain  by  a  sum  of  damped  exponentials  with  appropriate 
weighting  coeffients.  Prony's  method  avoids  the  direct  nonlinear  problem  of  minimizing 
the  sum  of  squared  errors  and  solves  a  pair  of  linear  problems.  However  in  many  cases 
involving  real  data  this  results  in  a  suboptimal  model  as  shown  for  the  acoustic  data 
(corresponding  to  the  ringing  of  a  wrench  hit  on  the  floor)  in  Fig.  1(a).  In  this  paper  we 
present  a  method  for  iteratively  adjusting  the  parameters  (poles  and  zeros)  in  Prony's 
method  to  values  that  further  minimize  the  sum  of  squared  errors.  The  method,  which 
we  call  the  iterative  Prony  method,  produces  a  much  improved  model  as  shown  in  Fig. 
Kb). 

Iterative  methods  for  ARMA  modeling  are  not  new  [6,  7.  8,  9.  10].  Most  stochastic 
methods  involving  maximum  likelihood  solutions  involve  iteration  (see  e.g.,  [6])  and  a 
very  effective  method  known  as  iterative  prefiltering  [7,  8]  is  capable  of  results  similar 
to  the  iterative  Prony  method.  The  latter,  like  man}'  other  methods  (e.g.,  [9]).  uses  the 
polynomial  coefficients  as  the  parameters  and  thus  does  not  have  a  clear  interpretation 
in  terms  of  placement  of  poles  and  zeros.  Iterative  prefiltering  can  also  lead  to  oscillatory 
behavior  in  the  sum  of  squared  errors  while  the  iterative  Prony  method  typically  results 
in  a  monotonic  decrease  of  the  error  norm  and  seems  to  have  better  behavior  when  the 
model  order  is  not  known. 

The  remainder  of  this  paper  is  organized  as  follows.  Section  2  states  the  basic  form  of 
the  modern  Prony  method  and  establishes  our  notation.  Section  3  develops  the  iterative 
Prony  algorithm.  Section  4  gives  an  example  and  comparison  of  performance  on  recorded 
acoustic  data.  Section  5  provides  a  summary  and  conclusions. 
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Figure  1:  Approximately  10  ms  segment  of  the  sound  corresponding  to  the  ringing  of 
a  wrench  hit  on  the  floor,  (a)  Model  produced  by  the  (non-iterative)  Prony  method  (4 
poles  and  3  zeros),  (b)  Model  produced  by  the  iterative  Prony  method  (also  4  poles  and 
3  zeros). 


2      Prony's  Method  of  ARMA  Modeling 

Although  there  are  many  variations  (see  [11,  Chapter  11]),  the  modern  Prony  method 
can  be  thought  of  as  a  method  for  approximating  a  data  sequence  x[n]  by  a  sequence 
x[n]  which  is  the  impulse  response  of  a  rational  linear  system  satisfying  the  difference 
equation 


x[n]  +  a^ln  -  1]  + h  aPx[n  -  P]  =  b0S[n]  +  6^[n  -!]  +  •••  +  bQ6[n  -  Q) 


(1) 


If  the  requirement  x[n]  =  x[n};        n  —  0, 1, Ns  —  1  is  applied  to  (1),  where  Ns  is  the 

length  of  the  data,  and  the  difference  equation  is  evaluated  for  n  =  0, 1, . . . ,  Ars  —  1,  the 
result  is  the  matrix  equation 
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x[l] 
x[2] 


*IQ] 
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x[l] 
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0 
x[0] 


x[Q  -  1]      x[Q  -  2] 
x[Q]  x[Q  -  1] 


x[Q  -  P] 
x[Q-P  +  l] 


x[Ns-l)    x[Ns-2]    x[Ns-3]    ■■■    x[N.-P-l] 
which  can  be  written  in  partitioned  form  as 
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a2 


aP 


bo 


bQ 
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(2) 


a  = 


b 
0 


(3) 


Thus  if  A*s  >  P  +  Q  +  1  a  least  squares  solution  for  the  a,  ( AR)  parameters  can  be  found 
from  the  lower  set  of  partitions,  and  the  upper  set  of  partitions  can  be  used  to  find  the 
bj  (MA)  parameters.  This  is  the  fundamental  idea  underlying  Prony's  method. 

Prony's  method  is  frequently  expressed  directly  in  the  signal  domain.  Instead  of 
solving  (3)  for  the  vector  b,  we  can  instead  find  the  roots  of  the  denomimator  polynomial 
of  the  system  in  (1) 


A(z)  =  1  +  a, 


■l+a2z~2  + 


+  ap 


-P 


(4) 


and  express  the  impulse  response  x[n]  in  terms  of  these  roots.  Specifically  if  r1?  r2, . . .  ,rp 
are  the  roots  of  (4)  (assumed  to  be  distinct)  then  x[n)  can  be  written  as 


x[n]  =  erf  +  c2r^  +  •  •  •  +  cPrnp 


(5) 


Again,  if  we  require  x[n]  =  x[n]\  n  =  0, 1 Ns  —  1,  then  by  evaluating  (5)  for 

n  =  0. 1, . . .  ,  Ns  —  1  we  have  the  matrix  equation 
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which  can  be  solved  in  a  least  squares  sense  to  obtain  the  coefficients  c,.  This  implemen- 
tation of  Prony's  method  has  an  advantage  over  simply  solving  for  b  from  (3)  since  all 
of  the  data  is  used  in  the  computation.  When  (2)  or  (3)  is  used  to  find  the  b3,  only  the 
data  values  from  0  to  Q  are  used  in  the  computation. 

In  the  case  of  multiple  roots  at  the  same  location  a  slight  variation  of  the  same 
procedure  can  be  used.  Suppose,  for  example,  that  r}  is  a  double  root.  In  this  case.  x[n] 
has  the  form 

x[n]  =  dr;  +  c2nr^  +  •  •  •  +  cPrp  (7) 

and  the  matrix  equation  to  solve  for  the  coefficients  becomes 
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This  situation  is  rare  however,  because  computational  errors  and  errors  inherent  to  the 
modeling  method  itself  contribute  to  produce  roots  that  may  be  very  close  to  each  other 
but  not  exactly  at  the  same  location. 

While  Prony's  method  represents  a  clever  scheme  to  separate  the  solution  of  the  AR 
and  MA  parameters  in  the  modeling  problem,  the  separation  is  frequently  achieved  at  the 
cost  of  a  degradation  in  performance,  as  mentioned  in  the  introduction.  It  is  well  known, 
for  example,  that  Prony's  method  does  not  choose  the  AR  parameters  to  minimize  any 


norm  of  the  error  e[n]  =  x[n]  —  x[n]  but  rather  chooses  them  to  minimize  the  norm 
of  a  modified  error  of  linear  prediction  resulting  from  processing  the  original  data  x[n] 
through  the  filter  represented  by  A(z).  As  a  result,  there  is  motivation  to  modify  the 
model  obtained  by  Prony 's  method  through  an  iterative  scheme  that  seeks  to  directly 
minimize  the  norm  of  the  error  e[n].  This  method  retains  the  separability  of  the  original 
Prony  method  and  so  is  called  the  iterative  Prony  method.  It  differs  from  maximum 
likelihood  in  that  there  is  no  stochastic  criterion  and  it  does  not  attempt  to  solve  for  the 
AR  and  MA  parameters  simultaneously.  The  method  is  developed  below. 

3      An  Iterative  Prony  Algorithm 
3.1      Iterative  Algorithms 

A  multidimensional  function  Q{rl,  r2,  •  •  • .  rP)  that  is  continuous  and  differentiate  can  be 
minimized  using  one  of  several  very  powerful  optimization  techniques  known  as  gradient 
methods  [12].  Some  of  those  methods  are  derived  on  the  basis  of  a  quadratic  model  that 
can  be  obtained  from  a  truncated  Taylor  series  expansion  of  Q(r).  Let  r' k>  denote  the 
value  of  r  at  the  kth  iteration.  Then  for  any  point  r  =  r(fc'  +  6  when  6  is  small,  the 
function  can  be  approximated  by 


Q(t^  +  *)  «  qW(6)  =  QW  +  g{k)TS  +  -6TG^S 


(9) 


where  g  and  G  represent  the  vector  of  first  derivatives  (gradient)  and  the  matrix  of  second 
derivatives  (Hessian)  of  the  function  Q(r)  and  they  should  be  available  at  every  point. 
In  Newton's  method  the  iterate  r(;r+1)  is  taken  to  be  r(/c*  +  6*  \  where  the  correction 
S(  '  minimizes  q^k'(S).  This  method  is  only  well  defined  when  the  Hessian  matrix  G 
is  positive  definite,  in  which  case  the  k  iteration  of  Newton's  method  is  given  by  the 
following  procedure  [13]: 

1.  solve  G^6  =  -g'fc)     for    6  =  6{k) 

2.  setr(*+1>  =  r<;c>+£('c)  (10) 

The  fact  that  G^fc)  may  not  be  positive  definite  when  x^  is  far  from  the  solution,  and 
that  even  when  G^  is  positive  definite  convergence  may  not  occur,  makes  this  method 
undesirable  as  a  general  formulation  of  a  minimization  algorithm. 

Since  Newton's  method  is  defined  only  when  the  matrix  G(*'  is  positive  definite,  and 
this  matrix  is  positive  definite  only  when  the  error  S  is  ''small",  we  can  say  that  the 

6 


method  is  defined  only  in  some  neighborhood  f2(^  of  r(fc)  in  which  <?(fc)(*)  agrees  with 
Q(r^  +  S)  in  some  sense.  In  such  cases,  it  is  correct  to  choose  r^+1^  =  r(fc)  +  6(  ',  with 
the  correction  6^  *  minimizing  q^Hti)  for  all  r(*'  +  6  in  f2(/c).  This  method  is  referred  to 
as  the  restricted  step  method  because  the  step  is  restricted  by  the  region  of  validity  of  the 
Taylor  series  [13].  The  region  of  definition  for  the  k     iteration  can  then  be  expressed  as 

Q{k)  =  {r:  ||r-r(fc)||    <  h{k)}  (11) 

where  ||  •  ||  denotes  the  norm  of  the  vector.  In  this  case,  the  optimization  problem  can 
be  stated  as: 

minimize^  q(k){6)  subject  to  ||6||    <  h(k)  (12) 

As  mentioned  before,  the  least  squares  norm  ||  •  ||2  is  the  one  most  commonly  used  in 
these  type  of  problems  so  it  is  the  one  used  in  this  paper.  The  problem  that  now  becomes 
apparent  is  how  to  select  the  error  margin  h(;c)  of  the  neighborhood  (11).  This  margin 
should  be  as  large  as  possible  because  the  iteration  step  is  directly  related  to  it.  Various 
methods  have  been  proposed  to  control  the  parameter  h(/c).  One  of  these  methods  [13] 
attempts  to  insure  that  the  Newton's  minimization  problem  (10)  is  always  defined  by 
adding  a  multiple  of  the  unit  matrix  I  to  G(  *  and  computing  the  new  problem 

(GW  +  itf)*W  =  -gW  (13) 

where  the  net  effect  is  that  increases  in  u  cause  ||6||2  to  decrease,  and  vice  versa. 

If  we  define 

A0(k)        0(k)  _  Qir(k)  +  6(k)) 

o{  '  =  — - —  =  — — (14) 

then  the  ratio  p(k^  represents  a  measure  of  the  accuracy  to  which  q^{6^)  approximates 
Q(r^  +  6^  ')  on  the  A:th  step,  and  as  the  accuracy  increases  p^  gets  closer  to  unity. 
Using  (14),  Marquardt  [14]  suggested  an  algorithm  that  tries  to  adaptively  maintain  h^fc) 
as  large  as  possible  while  controlling  the  ratio  p(fc'.  The  kth  iteration  of  such  an  algorithm 
is  stated  as 

1.  given  r{k)  and  v{k).  calculate  g{k)  and  G{k): 

2.  factor  G(fc)  +  i/(fc)I;  if  not  positive  definite,  reset  u(k)  =  4i/(fc)  and  repeat; 

3.  solve  (13)  to  find  6(k): 

4.  evaluate  Q(r{k)  +  6{k))  and  hence  p{k): 


5.  if  p(k)    <  0.25  set  i/<*+1)  =  4*/(/c) 

else  if  p{k)   >   0.75  set  u^k+l^  =  i/<*>/2 
else  set  u^k+l)  =  i/(/c>; 

6.  if  p(k)    <  0  set  r(fc+1)  =  r<*>  else  set  r(/c+1>  =  r(k)  +  6 


(*) 


Here  the  parameters  0.25,  0.75,  4  and  2  are  arbitrary  and  v^  >  0  is  also  chosen  arbi- 
trarily. Proofs  of  global  and  second  order  convergence  for  this  algorithm  are  given  in  [13] 
for  the  cases  when  the  first  and  second  derivatives  of  the  function  Q(r)  exist,  and  the 
vector  r(/c)  belongs  to  a  bounded  rc-dimensional  space  for  all  k.  Although  this  method 
does  have  some  disadvantages,  it  represents  a  good  basis  for  the  formulation  of  a  general 
minimization  algorithm. 


3.2      Computation  of  Derivatives 

Let  us  now  return  to  the  problem  of  representing  a  sequence  x[n]  as  a  linear  combination 
of  complex  exponentials.  Equation  6  can  be  written  as 


Re  =  x-f 


(15) 


where  e  is  the  equation  error,  x  represents  the  data,  which  may  or  may  not  be  complex, 
c  is  the  vector  of  complex  coefficients,  and  R  is  the  matrix  of  complex  roots,  which  can 
be  written  in  terms  of  real  and  imaginary  parts  as 


R  = 


1 
rfl,     +    jrh 
{rRl     +    jrh)2 
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tr2     +    jri3 
(rR2     +    jrh)' 
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fRP      +  3rIP 
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(16) 


By  defining 


<2(r)  =  ||e||a  =  e'Te  =  (Re  -  x)'T  (Re  -  x) 


(17) 


it  is  clear  that  the  problem  is  to  find  the  vector  r  =  r/?  +  jrj  of  P  complex  roots  that 
minimizes  the  function  Q(r). 


To  make  the  following  development  less  cumbersome  let  us  define  the  gradient  oper- 
ator with  respect  to  the  real  and  imaginary  parts  of  the  roots  as 


VrdM 
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R2 


0'Rp 
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on, 


(18) 


Then  the  gradient  vector  g  of  first  derivatives  and  the  Hessian  matrix  G  of  second 
derivatives  can  be  written  compactly  as 


g  =  VrQ  = 


^TRQ 

Vr;<2 


and 


G  =  Vr   (VrQ 


Vr/J(VrRQ)T    VrR(Vr/Q 


,T  1 


(19) 


(20) 


Vr,(VrjlQ)J     Vr;(Vr;Q)J 

Note  that  it  is  essential  to  work  with  the  derivatives  involving  the  real  and  imaginary 
parts  of  the  roots  rather  than  the  complex  roots  themselves  because  Q{r)  in  (17)  is  not 

an  analytic  function  of  the  complex  variables  r^.  r2 rp.  It  is  shown  in  the  appendix 

that  the  upper  and  lower  partitions  of  g  are  given  by 

VrfiQ  =  2Re{F-Te}  (21) 

and 

Vr/Q  =  2Im{F*Te}  (22) 

where  F  is  a  matrix  with  columns  f,  defined  bv 
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Further,  if  S  is  defined  as  the  matrix  with  columns 


0 
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2 

6r, 


def 

s,  = 


Ur) 


c, 


(24) 


then  it  can  be  shown  (see  appendix)  that  the  partitions  of  G  are  given  by 

Vrfl(Vrfi<2)T  =  2Re  {F*TF  +  diag  (S*Te)} 

Vrfl(Vr/Q)T  =  21m  {F'TF  -  diag  (S'Te)} 

Vr/(Vr/iQ)r  =  -21m  {F-F  -  diag  (S'Te)} 

Vr;(Vr;c2)T  =  2Re  {F*rF  -  diag  (S'Te)} 


(25) 

(26) 
(27) 
(28) 


where  the  notation  ''diag  "  applied  to  a  vector  represents  the  operation  of  forming  a 
diagonal  matrix  whose  components  are  the  components  of  the  vector.  Thus  the  gradient 
g  and  Hessian  G  are  convenient  to  compute. 

3.3      Real  Axis  Poles  and  Zeros 

Some  discussion  is  necessary  about  how  the  iterative  Prony  algorithm  handles  poles  and 
zeros  on  the  real  axis.  When  such  poles  and  zeros  occur,  the  imaginary  parts  and  all  of 
the  associated  derivatives  are  zero.  Thus  poles  and  zeros  initially  on  the  real  axis  remain 
on  the  real  axis  through  all  successive  iterations.  This  can  be  a  practical  disadvantage 
because  the  original  Prony  algorithm,  which  is  used  to  produce  the  starting  configuration 
of  poles  and  zeros,  can  sometimes  place  a  pair  of  poles  or  zeros  on  the  real  axis,  while  in 
fact  this  pair  really  belongs  off  of  the  real  axis  in  a  conjugate  symmetric  configuration. 
In  this  situation  the  method  so  far  described  will  never  reach  the  true  configuration.  A 
modification  is  therefore  introduced  to  deliberately  displace  those  roots  from  the  real 
axis.  The  algorithm  places  the  roots  in  a  conjugate  symmetric  position  by  calculating 
the  mean  of  their  values  and  adding  to  them  a  small  arbitrary  offset  in  the  imaginary 
direction,  and  proceeds  with  the  iterations.  If  the  tendency  of  the  roots  is  to  ''go  back'1  to 
the  real  axis  then  they  are  returned  to  their  initial  position  and  the  iterations  continue. 
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Figure  2:  Displacement  of  the  poles  and  zeros  of  an  order  (4,3)  model;  the  modeled  signal 
in  this  case  actually  has  two  poles  on  the  real  axis. 

Otherwise  the  roots  may  continue  to  spread  apart  and  move  as  a  complex  pair.  Figure 
2  is  an  example  of  the  displacement  of  the  poles  and  zeros  of  an  order  (P.Q)  =  (4,3) 
model.  In  this  case  the  modeled  signal  actually  has  two  poles  on  the  real  axis  and  the 
initial  model  correctly  placed  two  of  the  poles  on  the  real  axis.  Those  poles  are  displaced 
from  the  real  axis  by  an  amount  of  approximately  0.3  in  the  imaginary  direction  by  the 
algorithm,  but  then  after  some  iterations  it  is  clear  that  the  poles  are  tending  to  return 
to  the  real  axis.  At  this  point  the  poles  are  returned  to  the  real  axis  by  setting  their 
imaginary  parts  to  zero  and  the  iterations  continue  until  convergence  is  obtained.  The 
opposite  situation  is  shown  in  Figure  3.  In  this  case  the  initial  model  also  has  two  poles 
located  on  the  real  axis,  but  contrary  to  the  case  presented  above,  the  roots,  after  being 
displaced  from  the  real  axis,  (again  by  an  amount  of  0.3),  continue  to  move  away  from 
the  axis  until  they  reach  their  final  position  in  the  first  and  fourth  quadrants  closer  to 
the  unit  circle. 
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Figure  3:  Displacement  of  the  poles  and  zeros  of  an  order  (4.3)  model.  The  initial  model 
shown  has  poles  on  the  real  axis;  the  final  model  in  this  case  has  those  poles  moved  to 
conjugate  symmetric  locations. 
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4     Performance  Comparison 

The  iterative  Prony  method  has  been  tested  on  a  large  variety  of  data  including  both 
simulated  and  recorded  sonar  and  speech  data.  In  testing,  we  compared  the  performance 
of  the  new  iterative  Prony  algorithm  to  that  of  the  iterative  prefiltering  algorithm,  which 
previously  had  been  found  to  be  the  algorithm  most  suitable  for  our  work  in  waveform 
modeling  of  acoustic  data  in  the  signal  domain  [5].  In  tests  on  simulated  data,  we  found 
that  when  the  correct  order  of  the  model  is  known  and  it  is  used  to  model  the  data,  then 
both  iterative  algorithms  tested  produced  very  good  models.  However,  when  given  an 
order  higher  than  the  true  order  of  the  signal,  the  iterative  prefiltering  algorithm  tended 
to  oscillate  and  took  longer  to  reach  convergence.  On  the  other  hand,  in  most  cases  the 
performance  of  the  iterative  Prony  method  was  not  affected  by  this  change  in  the  order 
of  the  model.  The  final  model  would  simply  have  extra  poles  and  zeros  positioned  to 
cancel  each  other. 

On  the  real  recorded  data  we  found  that  in  all  cases  the  iterative  Prony  method 
provides  much  better  models  than  the  regular  (non-iterative)  Prony  method.  In  man}' 
cases,  the  performance  of  the  iterative  prefiltering  method  was  comparable  to  that  of 
the  iterative  Prony  method,  but  in  a  large  number  of  cases  the  error  produced  by  the 
iterative  prefiltering  algorithm  was  highly  oscillatory.  The  same  number  of  iterations 
was  used  for  both  the  iterative  Prony  and  the  iterative  prefiltering  methods  in  all  of  the 
tests.  In  order  to  obtain  the  best  possible  results  from  the  iterative  prefiltering  algorithm 
(especially  in  those  cases  when  there  was  no  convergence),  the  model  selected  was  that 
corresponding  to  the  iteration  with  the  lowest  error  between  the  model  and  the  original 
sequence. 

Figure  4  shows  the  results  of  modeling  a  100-point  segment  (approximately  8  ms) 
of  underwater  recorded  data  (ice  cracking).  The  signal  was  modeled  using  the  signal 
domain  Prony  method  of  section  2  and  the  two  iterative  algorithms  mentioned  above.  A 
model  of  order  (12,11)  was  used  in  all  cases.  It  is  clear  that  the  models  produced  by  the 
iterative  methods  (Fig.  4(b)  and  (c))  represent  the  original  signal  much  more  faithfully 
than  the  model  produced  by  the  non-iterative  method  (Fig  4(a)).  It  can  also  be  seen  that 
the  model  produced  by  the  iterative  Prony  method  provides  a  much  better  representation 
of  the  original  signal  than  the  model  produced  using  the  iterative  prefiltering  algorithm. 
Figure  5  shows  the  behavior  of  the  sum  of  squared  errors  at  each  iteration  between  the 
original  signal  and  the  models  produced  by  the  two  iterative  methods  compared  here. 
The  oscillating  behavior  of  the  error  when  using  the  iterative  prefiltering  algorithm  was 
found  to  be  one  of  the  main  disadvantages  of  using  this  method.  Because  the  iterative 
Prony  method  is  based  in  minimizing  the  error  between  the  model  and  the  original  signal 
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Figure  4:  Segment  of  recorded  underwater  sound  corresponding  to  ice  cracking  and  three 
ARMA  models,  (a)  Non-iterative  Prony  model,  (b)  Iterative  prefiltering  model,  (c) 
Iterative  Prony  model. 
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by  directly  adjusting  critical  parameters,  the  behavior  of  the  error  tends  to  be  smoother 
and  to  decrease  monotonically  toward  convergence  with  increasing  number  of  iterations. 
In  this  case  the  error  rises  slightly  at  the  third  iteration  but  then  decreases  monotonically 
for  the  remainder  of  the  test. 

5      Conclusions 

ARMA  models  can  produce  very  accurate  reproductions  of  short  segments  of  acoustic 
data  but  the  accuracy  depends  critically  on  the  method  used  to  find  the  model  parame- 
ters. Frequently  nonminimum-phase  models  are  required  to  match  the  data  in  the  signal 
domain.  Prony's  method  is  convenient  to  use  for  ARMA  modeling  because  it  finds  the  AR 
and  MA  parameters  separately  by  solving  linear  equations.  However  the  model  provided 
by  Prony's  method  is  frequently  suboptimal.  The  iterative  Prony  method  described  in 
this  paper  results  in  a  significant  performance  improvement  over  the  basic  Prony  method 
in  practical  problems  while  retaining  the  separability  characterizing  the  original  method. 
The  new  method  is  based  on  iteratively  moving  the  poles  of  the  model  in  directions  that 
tend  to  decrease  the  error  between  the  original  and  modeled  data  and  has  been  found 
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experimentally  to  have  excellent  convergence  properties. 

The  new  method  has  also  been  compared  with  the  iterative  prefiltering  algorithm  of 
Steiglitz  and  McBride  [7].  Neither  method  is  guaranteed  to  have  monotonic  convergence 
of  the  error,  but  in  many  cases  of  testing  on  real  and  simulated  data,  where  the  true  model 
order  was  not  known,  the  iterative  prefiltering  algorithm  had  an  oscillatory  behavior 
while  the  iterative  Prony  method  showed  monotonic  or  near  monotonic  convergence.  The 
computational  requirements  of  the  two  algorithms  were  compared  by  testing  and  counting 
floating  point  operations  within  the  MATLAB  implementation.  For  a  complex  data  set, 
the  computational  requirements  for  the  iterative  Prony  method  are  approximately 

672P3  +  (24Ars  +  102)P2  +  (60Ars  +  46)P  +  198Ars 

where  P  is  the  number  of  poles  and  ATS  is  the  length  of  the  data,  while  those  for  the 
iterative  prefiltering  algorithm  are 

64(P  +  Q  -  l)3  +  8NS(P  +  Q)2  +  10(P  +  Q)NS  +  12Ars, 

where  Q  is  the  number  of  zeros.  In  typical  problems  of  models  in  the  range  of  8  to  16 
poles  and  zeros  the  iterative  Prony  method  requires  about  25%  more  operations  than 
the  iterative  prefiltering  method.  However  the  likelihood  of  the  iterative  Prony  method 
to  reach  convergence  in  practical  applications  can  result  in  running  the  algorithm  for  a 
fewer  total  number  of  iterations  and  thus  an  overall  decrease  in  the  total  computation. 
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A     Appendix 

A.l      Gradient  vector  g 

From  (15)  it  is  easy  to  see  that 


dR 


de" 


— -  =  — c    and 

orx        orx  orx 


=  c 


•T 


dR- 


(29) 


where  rt  here  represents  the  real  or  imaginary  part  of  the  ith  root.   Using  (29)  and  the 
chain  rule  in  (17)  leads  to 


dQ         mT  dR 


drRt 


=   € 


drRt 


c  +  c 


■^R-T«=2Re 


drRi 


,TdR 


ch 


R, 


(30) 


and  explicit  evaluation  of  the  partial  derivative  of  the  matrix  R  results  in 


dR 

drRt 


•c  = 


0 

1 

2(rR,.+jr/,) 
3(rn,  +jrIi)' 


.  {Ns-l){r^  +jrIt 


*A'..-2 


def  r 
C,    =   I, 


(31) 


where  c,  is  the  ith  component  of  the  vector  c.    Then,  using  (31),  equation  30  can  be 
written  as 

!^  =  2Re[€-Tft]=2Re[f;Te]  (32) 

orRi 

Finally,  using  (32)  for  i  =  1  •  •  •  P,  the  upper  partition  of  (19)  becomes 


Vr„Q  =  2IW 


/ 

'  f-  - 

' 

f- 

6     > 

I 

J'pT . 

j 

(33) 


which  is  the  same  as  (21). 

A  similar  procedure  can  be  used  to  obtain  the  gradient  of  Q  with  respect  to  the 
imaginary  part  of  the  vector  r.  Once  more,  from  (29)  and  the  chain  rule  applied  to  (17) 
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it  follows  that 


dR 

drIx 

dRmT 
drJt 


c     =     jfi 


Thus 


and  (22)  follows. 


dQ 
drIx 


=  ]e'TU-3i:Te=2lm{f;Te) 


(34) 
(35) 

(36) 


A. 2      Hessian  matrix  G 

An  expression  for  the  Hessian  matrix  G  can  be  obtained  as  follows.   Using  a  somewhat 
more  convenient  notation  (20)  can  be  rewritten  as 


a 

drHk 

2=1... 

p 

a 

drjk 

dQ 

*SL  1 

k  =  1,. 

..,p 

G  = 


then  substituting  (32)  and  (36)  into  (37)  yields 

£-  (  [e'Tfi  +  f^e]    j  [e*rft- -  f^c]  ) 

4-([e-ft  +  ^]     j[e-ft-f-e]) 
Now  using  the  chain  rule  and  both  expressions  in  (29)  leads  to 


(37) 


G  = 


»  =  1 P 

fc  =  l,...,P. 


(38) 


G    = 


e-r  Jfi.  +  c-raR^f.  +  f.r|Rc  +  ffil€ 


»r  af 


r9R,rf         f  .r  3R  .        9f*T 


drlk        c       ar/fc     •         t     drIkC       ~dt~t 


i  =  l,...,P;     fc  =  l,...,P 


(39) 
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and  from  the  definition  of  f,  in  (31)  it  is  seen  that 

0 
0 
2 


drRk 


6(rHl  +;>/.) 
l2(rK+jrIt)2 

(Ns-l)(N,-2)(rRi+jrIf>-:i 


c,    if  i=k 


(40) 


0 


otherwise 


Thus  if  s,  is  defined  by  (24)  it  is  seen  that 

ft 

drR. 


=  &i6il 


where  8{k  is  the  dirac  delta.  In  the  same  way  it  can  be  shown  that 


drlk 


=  js{6u 


(41) 


(42) 


These  last  two  expressions  together  with  (31), (34).  and  (35)  can  now  be  substituted  in 
(39)  to  obtain 

"     [e*TSt6tk  +  f^ti  +  £*&  +  S-Te6th}       J  [e'TSt6lk  +  f^  -  f^f*  -  S'Te6ik]    ' 

G    = 

.  ;  [e'Tst6th  -  f;Tft  +  ft'Tfk  -  s'JeSlk]     -  [e*TSi6ik  -  f;rf,  -  f*rf,  +  s'Je6lk] 

i  =  l,...,P;     k=l,....P.  (43) 

Since  the  elements  of  the  matrix  G  are  formed  by  additions  and  subtractions  of  complex 
scalars  with  their  respective  complex  conjugates,  an  alternate  expression  for  G  is 

'    2Re  [f-rffc]  +  2Re [s*r^*]      21m [f*rfj  -  21m  {s'tTeStk] 
G    = 

.  -21m  [f-rf,]  +  21m  [sfe6ik]    2Re  [f^f*]  -  2Re  \s'JeStk] 

i  =  l,...,P;     fc  =  l,...,P.  (44) 

Finally,  since  the  notation  in  (44)  specifies  the  ikth  element  of  each  partition,  we  notice 
that  it  is  equivalent  to  (25)  through  (28). 
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